Since the beginning of the Landsat missions, the remote sensing community has been interested in developing universal algorithms for extracting water quality information from remotely sensed images [@Lots of old papers]. While the oceanic community has had significant success developing universal algorithms for chlorophyll, sediment, and dissolved organic carbon (DOC) [cites], there is no inland water equivalent. Much of this discrepancy comes from the greater optical complexity of inland waters, which prevents the use of a more universal algorithm, but progress on inland waters is further impeded by the lack of a shared dataset of satellite overpasses matched to in situ concentration information. Here we create and share the largest such overpass dataset ever assembled. We also outline and share our approach to bringing together three free, publicly available datasets to generate a high-graded, analysis-ready dataset for researchers working on remote sensing of water quality. While a single universal algorithm may be an unattainable goal, we anticipate that this dataset will move us towards more universal approaches based on shared and equal access to overpass information.
Despite this long-recognized potential, the general hydrology and limnology communities have, until recently, not integrated remote sensing of inland waters into their research approach [Topp]. Instead, these communities have focused much of their research on Eulerian sampling schemes, in which sensors or people repeatedly sample the same points in a river or lake [DoyleEnsign]. This approach has generated a wealth of information on temporal variability in inland waters, but far less work has examined spatial variability in rivers, lakes, and estuaries. Remote estimates of water quality in these ecosystems would allow for rapid assessment of potential algal blooms, detection of high-sediment waters, and analysis of spatio-temporal variability [cites].
Serious citation of Topp, maybe none of this at all?
With the profusion of publicly available in situ water quality datasets and the relatively easily accessible satellite mission archive
| Landsat satellite | Years | Available images |
|---|---|---|
| Landsat 5 | 1984-2012 | 192,688 |
| Landsat 7 | 1999-2018 | 188,781 |
| Landsat 8 | 2013-2018 | 58,585 |
For LAGOSNE data see here
Dataset generation
Distribution of observations across the conterminous USA. The data are split by observation type, where "total" represents an overpass for any of the four primary parameters.
For both DOC and TSS, our matchup dataset is missing the long tail of data present in the in situ dataset. What kinds of sites were dropped to create this discrepancy? Nearly all of them are streams.
| Estuary | Lake | Stream |
|---|---|---|
| 6 | 54 | 12665 |